[00:31.100 --> 01:02.290]  A little pause... and we are back.
[01:42.440 --> 01:46.540]  I guess we can get started here. It's about ten o'clock, or close enough.
[01:55.120 --> 02:01.700]  I'm Dan Burrows, and I'm going to be talking about my research in the identification of distributed coordinated network attacks.
[02:01.700 --> 02:06.960]  Before I get started, they asked me to make a couple announcements just about the speakers that are coming up next.
[02:06.960 --> 02:14.320]  There's Dr. Ian Goldberg, who's going to be talking about arranging anonymous rendezvous immediately after this talk.
[02:14.320 --> 02:22.060]  And then Jay Beale, a bit later on, talking about attacking and securing Red Hat Linux and evaluating the effectiveness of Bastille Linux.
[02:22.660 --> 02:26.380]  And the other thing they told me to mention was that it's really, really hot in here,
[02:26.380 --> 02:31.900]  so you could do yourself some good to keep drinking plenty of water or whatever your beverage of choice is.
[02:35.450 --> 02:38.710]  Oh, that's...
[02:39.510 --> 02:41.470]  They're working on the other screen.
[02:43.530 --> 02:54.170]  So, I'm a graduate student at Dartmouth College, and I work as a research engineer at the Institute for Security Technology Studies that's housed up there.
[03:00.280 --> 03:03.620]  Okay, just a brief overview of what my talk's going to be about.
[03:03.620 --> 03:07.980]  I'm going to start off, give you a bit of background on what ISTS is and what we're doing there.
[03:08.160 --> 03:14.140]  An overview of what we're trying to accomplish with this distributed intrusion detection analysis.
[03:14.140 --> 03:20.840]  And then getting into a bit of the details about the basics of information warfare theory that we're applying to this.
[03:21.060 --> 03:23.380]  Bayesian multiple hypothesis tracking.
[03:23.400 --> 03:27.520]  How these two things apply to intrusion detection.
[03:27.760 --> 03:34.460]  And then, time permitting, see how we're doing for time, going to talk a little bit about other research projects that are going on at ISTS.
[03:36.220 --> 03:42.280]  So, I'm basically going to give kind of a high-level overview of what's going on with our system.
[03:42.280 --> 03:47.800]  Since it's kind of a short amount of time, not get into the real gory details of everything.
[03:49.380 --> 03:51.860]  So, a bit about ISTS.
[03:52.860 --> 04:01.500]  It's a counter-terrorism research institute that was started up about... it's been about a year and a half that we've been up and running now.
[04:01.500 --> 04:05.520]  At least parts of it are housed at Dartmouth College.
[04:05.520 --> 04:14.060]  The three main sections are the Cyber Security Division, Infrastructure Protection, which is the area that I work in,
[04:14.060 --> 04:20.440]  and Chemical Biological Threats, that is work being done over at the medical school at the college.
[04:20.740 --> 04:29.620]  We're funded mainly through the National Institute of Justice, which is the research arm of the Department of Justice.
[04:30.620 --> 04:38.300]  That's not our sole source of funding, but they're the ones who set up the institute and got things going.
[04:41.660 --> 04:49.040]  So, the current research at ISTS. There's of course the Distributed Intrusion Analysis, which is the area that I'm working in.
[04:50.440 --> 05:01.540]  Quantitative Security Risk Analysis, which is a formalization of methods of evaluating the risk and cost-benefit factors of a network,
[05:01.540 --> 05:06.680]  figuring out how to best allocate your resources and how to best allocate your money in improving your network,
[05:06.680 --> 05:13.140]  and for evaluating the security of a network for purposes such as insurance and other issues like that.
[05:13.140 --> 05:20.000]  The Forensic Tool Development, which, one of the interesting things at ISTS, when you approach security there,
[05:20.000 --> 05:25.500]  since we're funded by the Department of Justice, security isn't really their main goal.
[05:25.560 --> 05:34.200]  They're the law enforcement division of the government, so they're interested more in gathering evidence, gathering information,
[05:34.200 --> 05:41.120]  to be used basically to catch and to prosecute people, as opposed to securing things to keep people out of it.
[05:41.120 --> 05:49.440]  So the Forensic Tool Development is work that's being done in how to deal with, once information has been seized,
[05:49.440 --> 06:00.500]  either hard drives, computers, how to go through and do analysis on the files, file reconstruction from partially deleted files, and areas such as that.
[06:00.500 --> 06:07.920]  Semantic hacking, which is, as opposed to breaking in and just defacing a website, where someone will get onto some site,
[06:07.920 --> 06:19.620]  usually a news site, maybe information about companies or stock prices, and alter the information in order to manipulate it for some sort of financial gain.
[06:21.440 --> 06:26.060]  That's one of the projects, the kind of general internet health monitoring,
[06:26.060 --> 06:30.720]  being able to easily and rapidly tell where there are problem areas, starting on the internet,
[06:30.720 --> 06:36.840]  where segments of it might be going down or might be experiencing strange or unusual amounts of traffic.
[06:37.260 --> 06:45.100]  The Security Informant, which is a security news service and kind of monitoring service,
[06:45.100 --> 06:49.580]  not limited just to computer security, but just general security issues.
[06:49.940 --> 06:58.240]  And finally, the User Mode Linux Honeynet Project, which for any of you who might have been at the Black Hat presentations a few days ago,
[06:58.240 --> 07:04.340]  I believe Lance Spitzner talked about it. I wasn't there, but I think he talked a bit about the Honeynet Project,
[07:04.340 --> 07:09.680]  which is a series of computers that are being distributed to networks all over the country,
[07:09.680 --> 07:16.000]  that are basically there just to be hacked into so we can get a look at what people are doing,
[07:16.000 --> 07:22.060]  what they do when they get into the systems, and have a way of looking at the people attacking our systems.
[07:22.060 --> 07:29.980]  User Mode Linux is a tool, it's similar to VMware, where it allows you to, within one Linux box,
[07:29.980 --> 07:37.300]  you can run many instances of Linux within it, and it does this by intercepting kernel calls and redirecting them,
[07:37.300 --> 07:42.760]  and from the outside it can appear as if an entire network is running inside of one box,
[07:42.760 --> 07:45.620]  each having its own IP address, having different functionality.
[07:45.620 --> 07:51.120]  It was originally developed as a development tool and a testing tool,
[07:51.120 --> 07:56.220]  but it's now being turned over and being used for this Honeynet Project.
[07:58.100 --> 08:02.260]  So, on with the distributed IDS analysis.
[08:05.080 --> 08:12.340]  The objective of this is to identify related intrusions and attempts at intrusion across many networks,
[08:12.340 --> 08:14.860]  and identify coordinated attack efforts.
[08:14.860 --> 08:21.800]  Whether this is one person or one group that is going out and using similar methods to attack many different networks,
[08:21.800 --> 08:25.820]  or whether it's a number of people, or at least a number of different sources,
[08:25.820 --> 08:32.780]  that are all attacking a particular site or a particular set of sites for some purpose,
[08:32.780 --> 08:38.980]  to try and pull these out of intrusion detection system reports that are being generated across many networks,
[08:38.980 --> 08:44.060]  and figure out a higher level situational awareness of what's really going on,
[08:44.060 --> 08:47.080]  and what the goals of the attackers are.
[08:47.080 --> 08:53.460]  And this is being done through the use of distributed intrusion detection systems,
[08:53.460 --> 08:58.780]  and the application of information warfare theory and multiple target tracking algorithms.
[09:01.310 --> 09:03.630]  So, what is the need for this?
[09:03.630 --> 09:08.690]  Like I said before, ISTS is a counter-terrorism research institute,
[09:08.690 --> 09:12.570]  and the area that I work in is an infrastructure protection group.
[09:13.130 --> 09:21.170]  And this is basically a research center that's been designed to develop technology to aid groups,
[09:21.170 --> 09:25.910]  such as the NIPC and other infrastructure protection groups,
[09:26.150 --> 09:31.810]  to the ones that are concerned about coordinated attacks against infrastructure systems,
[09:31.810 --> 09:38.810]  such as whether they be communication systems, power generation systems, the internet itself, things such as that.
[09:38.810 --> 09:46.130]  And possible either attacks against a particular segment of infrastructure on a wide area,
[09:46.130 --> 09:51.890]  or attacks against multiple segments of infrastructure within a limited geographic area.
[09:52.350 --> 09:59.710]  And the other area that this is being applied to is in the early detection of distributed denial of service attacks.
[09:59.710 --> 10:08.850]  Because, as I'm sure everybody knows, the process of setting up a distributed denial of service attack is a long process with a slow build-up,
[10:08.850 --> 10:15.850]  that usually goes fairly undetected, or at least the people who detect it are usually protected against it,
[10:15.850 --> 10:20.490]  and they're happy, and don't get broken into while others do.
[10:20.490 --> 10:23.590]  And eventually, all the machines that are captured, all the zombie machines,
[10:23.590 --> 10:30.070]  are used to attack some victim who potentially has not seen any warning of this attack beforehand.
[10:30.070 --> 10:43.690]  So, we're hoping to be able to do some early detection of what someone is doing when they're going out and attempting to collect the zombies in preparation for a DDoS.
[10:49.070 --> 10:56.250]  Okay, this system, basically what we're doing, we're looking at data that's already being collected out there by intrusion detection systems.
[10:56.250 --> 11:01.810]  We're not interested in developing new IDS or new ways of collecting data,
[11:01.810 --> 11:05.570]  but use the existing structures that are out there, the existing systems,
[11:05.570 --> 11:11.170]  and to gather this data and look at it in a new way to develop a higher level awareness of this.
[11:11.170 --> 11:14.450]  And we're doing this through sensor data fusion.
[11:17.490 --> 11:25.850]  We're correlating data being gathered from the IDS in various ways over time and space
[11:26.230 --> 11:31.110]  in order to develop a higher situational awareness of what's happening.
[11:31.110 --> 11:34.370]  It's not an intrusion detection system in and of itself.
[11:34.370 --> 11:37.790]  It's using the information being provided by the IDS,
[11:37.790 --> 11:41.850]  and using the IDS as kind of a first level filter on what's happening,
[11:41.850 --> 11:46.990]  where the IDS looks at your network traffic or things that are happening on a host computer
[11:46.990 --> 11:54.750]  and filters this information down into the various reports about what it thinks are happening.
[11:54.750 --> 12:03.610]  We're interested in taking those reports and figuring out what the next level bigger picture is happening from these reports.
[12:03.610 --> 12:13.310]  In a way, it's the same job that a lot of security admins or security analysts are doing.
[12:13.310 --> 12:16.650]  And this is not an attempt to replace them with this,
[12:16.650 --> 12:21.250]  but just to give them another tool to have a better picture of what's going on.
[12:25.130 --> 12:34.770]  This is kind of the typical view on a network when you have a few different attackers coming in and attacking you.
[12:34.770 --> 12:38.850]  That you have your network, and that's kind of what you're concerned about protecting.
[12:39.010 --> 12:44.450]  And when someone is doing some form of distributed attack,
[12:44.450 --> 12:49.050]  you see the portion of the attack that coincides with your network space.
[12:49.130 --> 12:54.650]  And you don't see the rest of the attack, you just see what you're concerned with, basically.
[12:54.650 --> 12:57.810]  And this approach, it's a defensive approach, but it makes sense.
[12:57.810 --> 13:01.570]  This is what computer security is all about.
[13:01.570 --> 13:07.170]  Fortifying the defenses, building the moat around the castle, putting up the big walls, and keeping people out.
[13:09.130 --> 13:16.110]  But one of the things that's happening here is we're actually getting a lot of information that's being gathered at this point,
[13:16.110 --> 13:20.230]  and it's not being used as effectively as it could be.
[13:20.230 --> 13:22.930]  It's being used to keep people out of individual networks,
[13:22.930 --> 13:27.990]  but not to understand really what the people are doing and what their goals might be.
[13:27.990 --> 13:33.190]  So, we're trying to move to more of an attacker-centered view,
[13:33.190 --> 13:37.150]  to where, by looking at the information being gathered off of many networks,
[13:37.150 --> 13:45.350]  we first gather the data, gather what portions of the attacks you see from all these networks,
[13:45.350 --> 13:53.230]  and then it kind of comes in as a big chunk of data and then break this back apart
[13:53.230 --> 14:00.470]  in order to figure out what parts of this belong to each individual attacker or group of attackers,
[14:00.470 --> 14:03.690]  so we can get a better picture of what the attackers are doing,
[14:03.690 --> 14:06.930]  as opposed to the picture of what is happening to our network.
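As a rough illustration of the shift from a network-centered view to an attacker-centered view, the following Python sketch regroups reports that arrive network by network. The report fields, network names, and addresses are made up for the example, and grouping by source address is only a naive stand-in for the correlation described in the talk, since a single attacker may use many sources.

```python
from collections import defaultdict

# Hypothetical IDS reports as they arrive, network by network:
# (reporting network, source address, alert signature)
reports = [
    ("net-a", "192.0.2.10", "rpc-scan"),
    ("net-b", "198.51.100.7", "ftp-bruteforce"),
    ("net-b", "192.0.2.10", "rpc-scan"),
    ("net-c", "192.0.2.10", "rpc-scan"),
]

def by_source(reports):
    """Regroup network-by-network reports into one list per source address."""
    view = defaultdict(list)
    for network, source, signature in reports:
        view[source].append((network, signature))
    return dict(view)

attacker_view = by_source(reports)
for source, activity in attacker_view.items():
    print(source, activity)
```

Each individual network only sees one report from 192.0.2.10, but the regrouped view shows the same source probing three networks with the same signature.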
[14:11.810 --> 14:17.270]  So, the equations that were shown, I'm just kind of skipping over them briefly,
[14:17.270 --> 14:22.270]  were just a formalization of showing what information could be gathered out of the system,
[14:22.270 --> 14:24.910]  and they represent the best possible case.
[14:25.690 --> 14:30.210]  There are obviously limiting factors in that we, on any network,
[14:30.210 --> 14:34.050]  we don't see all the... it's rare that we'd ever see all the activity on it.
[14:34.050 --> 14:39.130]  A lot of intrusion detection systems are looking at things that are coming in and out across your perimeter,
[14:39.970 --> 14:44.690]  or in and out of certain segments of the network.
[14:46.050 --> 14:51.770]  But, you know, oftentimes there's a lot of communication and traffic that is entirely within your network
[14:51.770 --> 14:54.230]  that you're not looking at because you're not too worried about it.
[14:54.230 --> 14:58.330]  So, you're never going to see everything.
[14:58.650 --> 15:03.830]  And the other thing is that the sensors, the IDS systems, and any other type of network sensor,
[15:03.830 --> 15:07.030]  whether it be anything that's gathering information about what people are doing,
[15:07.030 --> 15:12.370]  firewalls, routers, etc., etc., they're not perfect, and they miss things,
[15:12.370 --> 15:16.610]  they make false reports, they have false positives.
[15:16.810 --> 15:23.670]  So, the previous view was kind of idealistic, and it's not really exactly what you're going to see.
[15:25.030 --> 15:30.270]  So, as I said before, it's kind of a two-step process where first we bring together all the data
[15:30.270 --> 15:36.390]  from across many networks, and then we break it apart based on...
[15:36.910 --> 15:40.170]  instead of breaking it apart, it comes in on a network-by-network basis,
[15:40.170 --> 15:44.890]  and then we break it apart, hopefully, down to an attacker-by-attacker basis.
[15:45.470 --> 15:51.730]  So, in the first stage, there's a number of groups that are starting to work on collecting data
[15:51.730 --> 15:59.070]  from many intrusion detection systems, groups like myNetWatchman, DShield, and Incidents.org.
[15:59.070 --> 16:02.770]  Incidents.org actually collects information from these two and others.
[16:03.010 --> 16:08.170]  And they've been doing some preliminary work, some sort of basic analysis of what's going on.
[16:08.170 --> 16:14.390]  You can go there and see what, over the past month, has been the most attacked port,
[16:14.390 --> 16:18.610]  what IP address has been the source of most attacks, things like that.
[16:19.870 --> 16:25.390]  Certainly, it's useful information, but it's kind of the first-stage approach to this.
[16:25.750 --> 16:31.610]  It starts to give you a bit of an idea of what's going on out there on the networks.
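The kind of first-stage summary mentioned here (most attacked port, most active source) can be sketched in a few lines of Python. The report format and the values are invented for illustration and do not reflect how any of the named services actually store or publish their data.

```python
from collections import Counter

# Hypothetical pooled reports: (source address, destination port)
reports = [
    ("192.0.2.10", 21),
    ("192.0.2.10", 111),
    ("198.51.100.7", 111),
    ("203.0.113.5", 111),
    ("192.0.2.10", 53),
]

def top_port(reports):
    """Return (port, count) for the most frequently targeted port."""
    return Counter(port for _, port in reports).most_common(1)[0]

def top_source(reports):
    """Return (address, count) for the source with the most reports."""
    return Counter(src for src, _ in reports).most_common(1)[0]

print(top_port(reports))    # (111, 3)
print(top_source(reports))  # ('192.0.2.10', 3)
```

Counts like these say what is popular, but not which reports belong to the same attacker, which is the gap the second stage is meant to fill.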
[16:36.700 --> 16:42.300]  Okay, on with a bit about information warfare theory.
[16:42.540 --> 16:43.220]  Whoops.
[16:43.600 --> 16:44.240]  Huh.
[16:44.320 --> 16:46.380]  Or a mistake. Hang on a sec.
[16:59.960 --> 17:09.800]  The key factor in information warfare is the OODA loop that was developed by Colonel John Boyd.
[17:10.760 --> 17:16.260]  These four stages, observe, orient, decide, and act, are kind of, in military terms,
[17:16.260 --> 17:22.980]  are the overarching method of conducting information warfare.
[17:22.980 --> 17:25.300]  And really, it's something that's pretty obvious.
[17:25.300 --> 17:34.420]  It's pretty much how any sort of attack or response or even game sort of situation would be played out.
[17:34.520 --> 17:39.120]  Again, it's just a formalization of things that people do all the time.
[17:39.120 --> 17:43.220]  The first stage is to observe, where you collect data, you collect information,
[17:43.220 --> 17:49.200]  whether this is coming from automated collection sources or from humans giving reports,
[17:49.200 --> 17:52.440]  any sort of sensor data input.
[17:52.440 --> 17:58.520]  And then the orient stage is where we take all of this information that we're gathering and try to make sense out of it.
[17:58.520 --> 18:02.940]  We try to figure out what does it mean, what is it telling us is going on.
[18:02.980 --> 18:07.620]  Based on that, we make decisions about what we're going to do next, and we evaluate these decisions.
[18:07.620 --> 18:12.800]  This is kind of like playing chess, where you think a few moves ahead.
[18:12.800 --> 18:15.000]  You say that this is what I think they're doing.
[18:15.520 --> 18:18.880]  If I take this response to it, how are they going to react to it?
[18:18.880 --> 18:20.540]  What's likely to be the outcome?
[18:20.540 --> 18:25.800]  We evaluate different decision possibilities and then eventually act upon this and implement it.
[18:30.780 --> 18:36.180]  The sort of automated OODA loop, and I say mostly automated,
[18:36.180 --> 18:39.380]  because there's always room for human judgment to be put in on this,
[18:39.380 --> 18:42.920]  because we don't trust machines and algorithms that much.
[18:44.220 --> 18:48.580]  The loop around the outer edge, if you can see the mouse pointer,
[18:48.580 --> 18:53.580]  is the main OODA loop where we're sensing information,
[18:53.580 --> 18:56.400]  and then this goes through some sort of data fusion system
[18:56.400 --> 19:00.760]  to do the detection and situational awareness development.
[19:00.760 --> 19:03.700]  It moves into a planning stage, the deciding stage,
[19:03.700 --> 19:06.860]  and then some sort of automated response is initiated.
[19:07.220 --> 19:14.520]  This is fed back through a feedback mechanism to possibly alter the sensing mechanisms
[19:14.520 --> 19:17.460]  to tell them to look at different areas, gather different data.
[19:18.360 --> 19:21.880]  Do different analysis, and the loop continues around.
[19:22.340 --> 19:28.240]  In order to update the process so that it's not a static process,
[19:28.240 --> 19:30.760]  there are things going on such as data mining,
[19:30.760 --> 19:33.520]  where we're looking for new patterns in the information
[19:33.520 --> 19:41.620]  to try and identify previously unknown methods or behaviors of attack.
[19:43.020 --> 19:48.080]  And also visualization that is fed into, that is taken out of the system
[19:48.080 --> 19:51.700]  to be given to humans, to the analysts,
[19:51.700 --> 19:54.940]  so that they can make their judgment about the behavior of the system
[19:54.940 --> 19:58.620]  and kind of be there, it's like the check and balance,
[19:58.620 --> 20:04.120]  and to adjust how the control back mechanism works to improve the system.
[20:08.830 --> 20:13.210]  So, the orient stage is often the weakest part of this loop.
[20:13.210 --> 20:18.530]  A lot of technology concentrates on the observe stage, the sensors.
[20:18.530 --> 20:22.850]  And this is both true of offensive and defensive technologies.
[20:22.850 --> 20:25.510]  It's kind of like, back to military terms,
[20:25.510 --> 20:29.610]  it's kind of like, you know, it's your radar systems
[20:29.610 --> 20:33.490]  that's trying to detect things, you know,
[20:33.490 --> 20:36.070]  versus like your stealth fighters trying to avoid detection.
[20:36.070 --> 20:38.650]  Same thing about the intrusion detection systems on your network
[20:38.650 --> 20:40.810]  and the people that are trying to find new ways around them
[20:40.810 --> 20:42.250]  and avoid being detected by them.
[20:43.210 --> 20:45.610]  There's a lot of concentration on that.
[20:46.290 --> 20:50.430]  And as far as the decide stage goes,
[20:50.430 --> 20:53.610]  at least in our case of dealing with intrusion detection and computer security,
[20:53.610 --> 20:56.910]  it's often fairly trivial to make the decision.
[20:56.910 --> 20:58.810]  Once you know what someone's trying to do,
[20:58.810 --> 21:00.610]  it's usually pretty easy to shut them down
[21:00.610 --> 21:04.030]  and stop them from accomplishing their goals.
[21:05.950 --> 21:10.470]  And the act stage is, again, merely the implementation of the decide stage.
[21:11.110 --> 21:13.970]  It's fairly easy to do once the decision has been made,
[21:13.970 --> 21:18.010]  but it can be disastrous if a poor decision has been made.
[21:18.030 --> 21:22.890]  And the decisions, of course, relate right back to the orientation.
[21:33.430 --> 21:36.070]  The knowledge development that takes place in the orient stage,
[21:36.070 --> 21:39.210]  we break down into a few more categories.
[21:39.210 --> 21:44.550]  Data refinement, this is kind of in the traditional information warfare,
[21:44.550 --> 21:45.410]  but it doesn't really apply here.
[21:45.410 --> 21:48.410]  This is kind of removing systemic noise out of the sensors.
[21:48.690 --> 21:53.050]  Object refinement is where we take the information that is being gathered
[21:53.050 --> 21:56.170]  and break it down into correlated events.
[21:56.430 --> 22:02.930]  These sensor readings are related to the same object.
[22:03.450 --> 22:06.570]  And then, once we have it broken down into objects,
[22:06.570 --> 22:11.610]  we then look at the patterns of motion, so to speak, of these objects
[22:11.610 --> 22:16.210]  and how they're behaving as a group, which is a situational refinement.
[22:16.430 --> 22:21.510]  And once you get to that stage, then you ask, why are they doing this?
[22:21.510 --> 22:24.730]  And what's the meaning behind this? What are they trying to accomplish?
[22:25.170 --> 22:29.910]  And finally, the process refinement is actually adjusting the information gathering
[22:29.910 --> 22:35.110]  and data fusion process to have the feedback.
[22:44.000 --> 22:47.060]  The question was, is the orient the most difficult part of the loop
[22:47.060 --> 22:52.600]  because it's the hardest or that we don't know much about it or whatever?
[22:53.040 --> 22:55.780]  And it's actually a bit about all of that.
[22:55.780 --> 23:01.960]  It's certainly, I think, the hardest because a lot of it involves
[23:01.960 --> 23:05.480]  not just dealing with the programs that you're writing or the systems
[23:05.480 --> 23:07.420]  that you're developing, like the IDS,
[23:07.420 --> 23:11.100]  but you have to know a bit about how people are going to behave
[23:11.100 --> 23:14.660]  and how different attacks are going to behave.
[23:14.800 --> 23:18.880]  So, I think it's certainly the most complicated
[23:18.880 --> 23:23.320]  and it's the one that the least amount of work has been done in.
[23:29.300 --> 23:35.220]  So, what we're using in this orient stage to try and figure out
[23:36.720 --> 23:41.460]  what the related events are is a method called Bayesian Multiple Hypothesis Tracking,
[23:41.460 --> 23:46.440]  which comes out of classical radar tracking theory.
[23:46.600 --> 23:52.280]  In this, we have the components that you have for your environment
[23:52.280 --> 23:56.140]  that this is taking place within, the sensors that are out there
[23:56.140 --> 23:58.980]  that are measuring, taking measurements on the environment,
[23:58.980 --> 24:01.700]  and the targets that are moving through your system.
[24:02.900 --> 24:05.720]  The environment, like I said, it's your state space.
[24:06.720 --> 24:10.400]  It also places constraints on the motion of the objects within it.
[24:11.060 --> 24:15.800]  The sensors are the ones that take a measurement of the environment,
[24:15.800 --> 24:20.620]  whether it's position or, in our case, you know, reports
[24:20.620 --> 24:24.760]  coming off the intrusion detection systems about a particular attack
[24:24.760 --> 24:27.280]  being used at a particular area at a time.
[24:27.360 --> 24:30.400]  And the targets are what you're trying to track
[24:30.980 --> 24:34.520]  and the knowledge we have to develop about them are behavioral models.
[24:36.640 --> 24:42.560]  So, this is just kind of like an example out of more to the classical tracking theory,
[24:42.560 --> 24:46.100]  like in a radar tracking, if we have some sort of targets that are moving
[24:46.100 --> 24:50.280]  in a two-dimensional space, and this could be, you know, like x, y,
[24:50.280 --> 24:54.540]  you know, latitude, longitude, if it's something in the physical world,
[24:54.540 --> 24:57.960]  or it could be any two dimensions that we're analyzing,
[24:57.960 --> 25:04.500]  where in this graph, these blips on the radar, so to speak,
[25:04.500 --> 25:10.020]  come in in pairs, where you'd have this one and this one at one sweep of the radar
[25:10.020 --> 25:14.740]  and then this one and this one come in later in the future.
[25:14.740 --> 25:17.480]  And we're trying to make sense out of what's going on here.
[25:17.480 --> 25:20.040]  What do we see happening over time?
[25:20.540 --> 25:28.200]  And in the hypothesis generation step of the Bayesian multiple hypothesis tracking,
[25:28.200 --> 25:33.120]  we build all possible hypotheses of what is happening.
[25:33.120 --> 25:35.700]  And these show two different possible tracks.
[25:35.820 --> 25:39.560]  These both say that we've got two targets out there,
[25:39.560 --> 25:42.620]  and they could be two targets that are moving,
[25:42.620 --> 25:45.020]  that kind of come in and move apart from each other in the top one,
[25:45.020 --> 25:47.400]  or cross over each other in the bottom one.
[25:47.700 --> 25:53.060]  And we build these tracks, sets of tracks that are hypotheses,
[25:54.360 --> 25:58.200]  that show all the possibilities of what is going on out there.
[25:58.200 --> 26:02.160]  And then these are then evaluated for how likely each one is.
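The hypothesis generation step for the two-target radar example can be sketched as below. This is a minimal brute-force illustration under simplifying assumptions that are not from the talk: exactly two scans, the same number of blips in each, and no false alarms, missed detections, or new targets. The coordinates are invented.

```python
from itertools import permutations

# Hypothetical blips (x, y) from two successive radar sweeps
scan1 = [(0.0, 0.0), (0.0, 2.0)]
scan2 = [(1.0, 0.5), (1.0, 1.5)]

def hypotheses(first, second):
    """Yield every one-to-one pairing of blips in `first` with blips in `second`.

    Each hypothesis is a list of tracks; each track is a (start, end) pair.
    """
    for perm in permutations(range(len(second))):
        yield [(first[i], second[j]) for i, j in enumerate(perm)]

all_hypotheses = list(hypotheses(scan1, scan2))
for tracks in all_hypotheses:
    print(tracks)
```

With two blips per scan this produces two hypotheses, one where the tracks stay on their own side and one where they cross. The number of pairings grows factorially with the number of blips, which is why brute-force enumeration becomes expensive.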
[26:05.240 --> 26:10.240]  In this view, each of the scans occurs at a different time,
[26:10.240 --> 26:14.340]  and the little circles and dots inside each scan are the readings that we're getting.
[26:14.420 --> 26:19.620]  And this is showing three scans with one possible hypothesis,
[26:19.620 --> 26:22.780]  where we're showing how target one was seen.
[26:22.780 --> 26:25.420]  We assume that these three things are related to target one.
[26:25.420 --> 26:27.260]  They're seen in all three scans.
[26:28.100 --> 26:31.020]  Target two is seen only in the first two.
[26:31.020 --> 26:33.040]  Target three is seen in the first and third, and so on.
[26:33.040 --> 26:34.880]  There's some false alarms in there,
[26:36.200 --> 26:38.540]  and other things like that going on.
[26:38.560 --> 26:40.040]  And we construct all...
[26:40.040 --> 26:43.660]  Again, this is where we construct all possible ones of these.
[26:43.660 --> 26:45.440]  Well, in the brute force method,
[26:45.440 --> 26:49.060]  there's some optimizations to get away from having to do that,
[26:49.060 --> 26:51.620]  to make it more computationally feasible.
[26:51.620 --> 26:55.180]  And then we evaluate each of these hypotheses.
[26:55.860 --> 27:01.320]  So, in the evaluation procedure,
[27:01.320 --> 27:03.000]  there are kind of two stages.
[27:03.000 --> 27:10.200]  One is the likelihood of the system state generating the reports that you're seeing by your IDS,
[27:10.200 --> 27:15.640]  and the other is the probability of the target actually moving from one state to another.
[27:16.460 --> 27:19.160]  The future scans that come in will...
[27:19.160 --> 27:23.580]  So we build what our belief is in every hypothesis,
[27:23.580 --> 27:26.000]  and as more information comes into the system,
[27:26.000 --> 27:29.680]  the belief in a particular hypothesis is either strengthened or weakened,
[27:29.680 --> 27:33.060]  until eventually in a convergent system,
[27:33.060 --> 27:36.500]  one or a set of a few hypotheses will emerge
[27:36.500 --> 27:40.300]  as the most likely candidates for what's going on in the real world.
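The way belief in each hypothesis is strengthened or weakened as scans arrive can be sketched as follows. This is an illustrative sketch, not the system described in the talk: the hypothesis names and all probabilities are invented, and each scan is assumed to supply, per hypothesis, a sensor likelihood and a motion probability that are simply multiplied into the current belief.

```python
def update_beliefs(beliefs, evidence):
    """Return renormalized beliefs after one scan.

    beliefs:  {hypothesis: current belief}
    evidence: {hypothesis: (sensor_likelihood, motion_probability)}
    """
    unnormalized = {
        h: beliefs[h] * evidence[h][0] * evidence[h][1] for h in beliefs
    }
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Start with no preference between two competing hypotheses (made-up names)
beliefs = {"apart": 0.5, "cross": 0.5}

# Hypothetical per-scan evidence: (sensor likelihood, motion probability)
scans = [
    {"apart": (0.9, 0.6), "cross": (0.9, 0.4)},
    {"apart": (0.8, 0.7), "cross": (0.8, 0.3)},
]

for evidence in scans:
    beliefs = update_beliefs(beliefs, evidence)
    print(beliefs)
```

In this example the sensor likelihood is the same for both hypotheses, so the motion model alone moves belief toward "apart" with each scan.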
[27:41.240 --> 27:45.020]  So, the likelihood evaluation is a critical part of this,
[27:45.020 --> 27:50.340]  and this is the sort of the Bayesian approach to probability,
[27:50.340 --> 27:53.480]  where the forward behavior of the sensor is well understood,
[27:53.480 --> 27:57.780]  meaning that if you give me a real world event,
[27:57.780 --> 28:01.600]  I can tell you what my intrusion detection system is going to... how it's going to respond to that,
[28:01.600 --> 28:04.680]  that if you see this particular set of packets coming in,
[28:04.680 --> 28:08.920]  we know that this IDS is going to trigger a flag and say it's this sort of attack.
[28:09.040 --> 28:14.240]  What you don't know, and what the problem with a lot of IDSs is,
[28:14.240 --> 28:16.800]  is knowing backwards, because this isn't...
[28:16.800 --> 28:20.060]  you don't know what attack is coming in ahead of...
[28:20.060 --> 28:21.940]  you don't know what the attack is.
[28:21.940 --> 28:25.360]  You see the report, and you're trying to figure out what caused it,
[28:25.360 --> 28:28.940]  and we don't know whether...
[28:29.940 --> 28:32.700]  while we know a particular attack will trigger the IDS,
[28:32.700 --> 28:36.440]  if the IDS is triggered, we don't necessarily know it was specifically that attack.
[28:36.440 --> 28:39.960]  This could be some sort of... this could be a false positive reading.
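The gap between the forward direction (a known attack triggers the alert) and the backward direction (an alert implies that attack) can be made concrete with Bayes' rule. The numbers below are invented purely for illustration and are not measurements of any real IDS.

```python
def posterior_attack(p_alert_given_attack, p_alert_given_benign, p_attack):
    """Probability that an alert was caused by the attack, via Bayes' rule."""
    p_benign = 1.0 - p_attack
    numerator = p_alert_given_attack * p_attack
    evidence = numerator + p_alert_given_benign * p_benign
    return numerator / evidence

# Assumed values: the sensor fires on 95% of real attacks, fires on 1% of
# benign traffic, and 1 in 1000 observed events is actually this attack.
p = posterior_attack(0.95, 0.01, 0.001)
print(round(p, 3))
```

With these assumed values, the sensor responds reliably in the forward direction, yet an alert on its own corresponds to the attack only about 9% of the time, because benign events vastly outnumber attacks.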
[28:40.000 --> 28:53.960]  The question was, do we have profiles of certain types of attacks?
[28:53.960 --> 28:58.500]  And this is one of the areas of research.
[28:58.500 --> 29:02.580]  One of the things that we are trying to build up is a sort of a database.
[29:02.580 --> 29:06.400]  It's similar to existing vulnerability databases that are out there,
[29:06.400 --> 29:11.940]  but tying in particular profiles of what the attacks look like and what the behavior does look like.
[29:11.940 --> 29:14.440]  And yeah, that's a critical part to this.
[29:19.400 --> 29:24.780]  This equation is the basis of this sort of likelihood tracking,
[29:24.780 --> 29:34.840]  where over on the left, this is showing the probability of the system existing in state S at a certain time.
[29:34.840 --> 29:41.380]  And it's based on the likelihood of the sensors giving you the readings that you're seeing,
[29:41.380 --> 29:43.980]  if we truly are in this state,
[29:43.980 --> 29:52.000]  and the P minus is the likelihood of the target having moved from its previous state in the hypothesis to its current state.
[29:53.000 --> 30:03.440]  So the key feature to this sort of analysis is that it breaks apart the target's motion
[30:03.440 --> 30:06.460]  and separates it from the sensor's behavior,
[30:06.460 --> 30:09.620]  so we can model each of those independently,
[30:09.620 --> 30:13.060]  and then bring them together through this tracking method
[30:13.060 --> 30:21.280]  to try and figure out the likelihood of a particular hypothesis
[30:21.280 --> 30:24.400]  being the actual real world events.
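
The slide is not part of the transcript, so the following is only a reconstruction in the standard Bayesian filtering form that matches the spoken description. The symbols are assumptions: s_t is the hypothesized state at time t, z_t the sensor reports at time t, L the sensor likelihood, q the target motion model, and C a normalizing constant.

```latex
% Assumed notation; a sketch of the form described, not the speaker's slide.
p(s_t \mid z_{1:t}) \;=\; \frac{1}{C}\, L(z_t \mid s_t)\, p^{-}(s_t),
\qquad
p^{-}(s_t) \;=\; \sum_{s_{t-1}} q(s_t \mid s_{t-1})\, p(s_{t-1} \mid z_{1:t-1})
```

The sensor model appears only in L and the target motion only in q, which is the separation described above.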
[30:25.140 --> 30:28.380]  Just skip over all the equations.
[30:29.840 --> 30:36.780]  So one of the big problems with this sort of method is it requires a lot of computation,
[30:36.840 --> 30:39.660]  a lot of effort to do this.
[30:39.800 --> 30:41.980]  There's a lot of information coming in,
[30:41.980 --> 30:45.480]  building all the possible hypotheses is very computationally expensive,
[30:45.900 --> 30:49.400]  but there are ways of kind of reducing this.
[30:49.400 --> 31:00.010]  Depending on what type of sensors we have out there,
[31:00.730 --> 31:03.110]  sensors fall into two kind of general categories,
[31:03.110 --> 31:05.270]  whether they're complementary or competitive,
[31:05.270 --> 31:09.810]  meaning whether you have two sensors that are looking at different areas or different aspects of it
[31:09.810 --> 31:12.090]  to where they never disagree with each other,
[31:12.090 --> 31:14.050]  because they're looking at different areas of your network,
[31:14.050 --> 31:19.770]  or like an IDS on one network and the same IDS over on another network are complementary,
[31:19.770 --> 31:23.710]  as opposed to two different types of IDS on the same network are competitive
[31:23.710 --> 31:25.970]  because they might disagree with each other,
[31:25.970 --> 31:32.270]  and then you have to take the further step of trying to figure out which one of those you believe,
[31:32.270 --> 31:35.690]  or some way of combining that information.
[31:36.590 --> 31:39.710]  The behavioral models of the targets,
[31:39.710 --> 31:44.190]  we are trying to model them with Markov processes,
[31:44.190 --> 31:47.690]  which allows for independence from past states.
[31:47.690 --> 31:51.790]  That means that the probability of the target moving to any future state
[31:51.790 --> 31:55.930]  is based only on its current state, not based on its past history.
[31:56.710 --> 32:01.850]  And while that seems to be a...
[32:01.850 --> 32:05.850]  that doesn't seem to quite make sense when you first hear it like that,
[32:05.850 --> 32:09.330]  there are methods of building Markovian processes
[32:09.330 --> 32:13.850]  that fairly well describe a lot of attacks in this method.
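
A minimal sketch of the Markov assumption, not from the talk: the belief over the attacker's next stage is computed from the current belief alone. The stage names and transition probabilities are invented for illustration.

```python
# Hypothetical attack stages and transition probabilities (illustration only).
STATES = ["recon", "exploit", "install"]
TRANSITIONS = {
    "recon":   {"recon": 0.6, "exploit": 0.4, "install": 0.0},
    "exploit": {"recon": 0.1, "exploit": 0.5, "install": 0.4},
    "install": {"recon": 0.0, "exploit": 0.0, "install": 1.0},
}

def predict(belief):
    """One Markov step: the next belief depends only on the current one."""
    nxt = {s: 0.0 for s in STATES}
    for cur, p_cur in belief.items():
        for s, p_move in TRANSITIONS[cur].items():
            nxt[s] += p_cur * p_move
    return nxt

# Start certain that the attacker is doing reconnaissance, then step twice.
belief = {"recon": 1.0, "exploit": 0.0, "install": 0.0}
after_two = predict(predict(belief))
print(after_two)
```

History can still matter in such a model if it is folded into the state itself, for example by making "has already done reconnaissance" part of the state description.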
[32:15.910 --> 32:19.630]  Bayes' theorem has a recursive nature to it,
[32:19.630 --> 32:22.790]  to where every stage, as more information comes in,
[32:22.790 --> 32:26.710]  we don't have to go back to the beginning and recalculate everything from the start.
[32:26.710 --> 32:29.390]  It can be added in one stage at a time.
[32:29.490 --> 32:33.030]  It's kind of like... it's similar to how, you know, if you're doing factorials,
[32:33.030 --> 32:38.410]  if you know what 5 factorial is, calculating 6 factorial only takes one more step.
[32:38.410 --> 32:40.930]  It's a similar process with this.
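
A minimal sketch of that recursive property, not from the talk: folding in one new report updates the existing belief in a single step and gives the same answer as recomputing from the start. The two hypotheses and the likelihood values are made up, and the hypotheses are held fixed over time to keep the example short.

```python
def update(belief, likelihood):
    """One Bayes step: multiply by the new report's likelihood, renormalize."""
    unnorm = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Hypothetical hypotheses and per-report likelihoods P(report | hypothesis).
prior = {"worm": 0.5, "manual": 0.5}
reports = [
    {"worm": 0.8, "manual": 0.3},
    {"worm": 0.7, "manual": 0.4},
    {"worm": 0.9, "manual": 0.2},
]

# Belief after the first two reports is already known...
belief = prior
for lik in reports[:2]:
    belief = update(belief, lik)
# ...so the third report costs only one more step.
incremental = update(belief, reports[2])

# Recomputing everything from the prior gives the same posterior.
batch_lik = {h: reports[0][h] * reports[1][h] * reports[2][h] for h in prior}
batch = update(prior, batch_lik)
print(incremental, batch)
```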
[32:41.690 --> 32:48.570]  And one of the keys to complexity reduction is using hypothesis pruning techniques,
[32:48.570 --> 32:51.470]  where we're going to figure out from one stage to the next
[32:51.470 --> 32:53.930]  which hypotheses are we going to progress forward,
[32:53.930 --> 32:55.850]  which ones are we going to get rid of.
[32:55.990 --> 33:00.270]  And there's a number of methods for doing this,
[33:00.270 --> 33:03.170]  whether you keep, like, just your top few of them
[33:03.170 --> 33:08.110]  or do some sort of thing that's similar to the way genetic algorithms work,
[33:08.110 --> 33:13.930]  where we keep a certain population and sort of add some randomness to it
[33:13.930 --> 33:18.370]  to make sure that we haven't sort of gotten stuck in a local minimum
[33:18.370 --> 33:22.050]  and are truly looking for the global correct solution.
[33:22.410 --> 33:26.550]  And also minimizing the dimensionality of the state space,
[33:26.550 --> 33:30.350]  how many variables we're looking at coming off of the IDS.
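
A minimal sketch of one such pruning scheme, not the speaker's implementation: keep the top few hypotheses by score and carry a few randomly chosen others forward so the search is not locked onto the current leaders. The function name, parameters, and scores are assumptions for illustration.

```python
import random

def prune(hypotheses, keep_top, keep_random, rng):
    """Keep the best-scoring hypotheses plus a few random survivors."""
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    survivors = ranked[:keep_top]
    rest = ranked[keep_top:]
    # The random survivors add diversity, in the spirit of a genetic algorithm.
    survivors += rng.sample(rest, min(keep_random, len(rest)))
    return survivors

# Hypothetical (name, score) pairs for six competing hypotheses.
scores = [0.30, 0.05, 0.25, 0.10, 0.20, 0.10]
hyps = [(f"h{i}", s) for i, s in enumerate(scores)]

rng = random.Random(0)  # seeded so runs are repeatable
kept = prune(hyps, keep_top=2, keep_random=1, rng=rng)
print(kept)
```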
[33:30.350 --> 33:39.540]  So, the next trick is applying this to distributed intrusion detection systems.
[33:42.580 --> 33:47.660]  Intrusion detection systems fall into kind of four basic categories
[33:47.660 --> 33:49.620]  based on their particular method,
[33:49.620 --> 33:53.600]  whether it's a signature-based type system
[33:55.340 --> 34:00.000]  or a statistical anomaly detection system,
[34:00.000 --> 34:03.320]  and their placement, whether it's a host-based system or network-based system,
[34:03.320 --> 34:06.700]  will affect, of course, what it's going to see
[34:06.700 --> 34:08.800]  and the view of the network it's going to have.
[34:10.740 --> 34:14.080]  For our purposes, we model these as our sensors,
[34:14.640 --> 34:17.260]  where the statistical anomaly systems,
[34:17.260 --> 34:20.400]  they have, of course, a very high false positive rate.
[34:20.400 --> 34:25.660]  They have a lower miss rate, particularly for kind of new unknown attacks,
[34:25.660 --> 34:28.280]  because they're not concentrating on known patterns,
[34:28.280 --> 34:31.080]  but rather looking for unusual events happening on your network
[34:31.080 --> 34:34.740]  with the assumption that most bad things are going to look unusual.
[34:36.560 --> 34:40.360]  Signature detection systems have a high miss rate for unknown attacks.
[34:40.360 --> 34:43.340]  In fact, if they don't know about the attack, don't have a signature for it,
[34:43.340 --> 34:44.900]  they're not going to see it.
[34:45.240 --> 34:48.620]  But they have a considerably lower false positive rate
[34:48.620 --> 34:50.820]  than the statistical anomaly systems.
[34:50.940 --> 34:53.700]  And then, of course, host-based systems see the activity occurring on a single host
[34:53.700 --> 34:59.100]  and network-based systems see your traffic on that particular section of your network.
[35:01.820 --> 35:06.000]  We also need a communication framework to gather data from these systems
[35:06.000 --> 35:12.060]  and to allow it to be interoperable with the system as a whole.
[35:12.720 --> 35:16.560]  There are three requirements for this sort of communication system.
[35:16.560 --> 35:18.200]  One is configuration interoperability,
[35:18.200 --> 35:20.880]  meaning that they just have to be able to talk to each other.
[35:20.880 --> 35:24.500]  The semantic interoperability, meaning that they can parse each other's data.
[35:24.540 --> 35:27.400]  And finally, the hardest, the intercomprehension,
[35:27.400 --> 35:33.460]  meaning that they agree on the meaning and definition of the data descriptors.
[35:33.460 --> 35:37.780]  That when something says, this is item type A,
[35:37.780 --> 35:40.980]  that everybody else knows what item type A means.
[35:42.200 --> 35:49.980]  There's some work being done in these areas.
[35:49.980 --> 35:54.780]  One of these is kind of out of date and I'm not sure how effective it is.
[35:54.780 --> 35:57.820]  The CIDF is the Common Intrusion Detection Framework
[35:57.820 --> 36:02.960]  and the IDEF is the Intrusion Detection Exchange Format.
[36:02.960 --> 36:10.040]  A couple methods for communication in a common language for intrusion detection systems.
[36:11.740 --> 36:16.480]  When we apply the multiple hypothesis tracking to intrusion detection,
[36:16.480 --> 36:20.000]  we have our components, our sensors are the IDS,
[36:20.000 --> 36:23.440]  the honeypot systems out there, human inputs,
[36:23.440 --> 36:28.560]  anything else that's gathering information about what's happening on various networks.
[36:28.560 --> 36:30.500]  The targets, of course, are the attackers
[36:30.500 --> 36:33.760]  that we're trying to monitor and figure out what they're doing.
[36:33.840 --> 36:40.300]  And the state space is our description of the current state of the attacker or the attack,
[36:40.300 --> 36:41.980]  what is going on.
[36:43.520 --> 36:52.280]  And environmental knowledge such as either the structure of your network
[36:52.280 --> 36:58.140]  or other things that may limit or influence the behavior and decisions made by the attacker.
[37:00.500 --> 37:03.180]  As opposed to traditional tracking systems,
[37:03.180 --> 37:07.640]  IDS sensors are a little unusual in that their measurements,
[37:07.640 --> 37:09.260]  when they make them, are highly accurate.
[37:09.260 --> 37:11.800]  They may be correct or incorrect,
[37:11.800 --> 37:17.340]  but they don't have the variableness of a radar sensor
[37:17.340 --> 37:19.960]  that tells you something's in this general area.
[37:19.960 --> 37:22.480]  An IDS will tell you, this is what I'm seeing,
[37:22.480 --> 37:25.200]  and it might be absolutely wrong about what it sees,
[37:25.200 --> 37:28.020]  but it will tell you a very definite answer.
[37:30.020 --> 37:33.580]  They also have a considerably higher false positive rate,
[37:33.580 --> 37:35.900]  particularly with the anomaly detection systems.
[37:40.520 --> 37:43.820]  So, it's important to know how we describe an attack.
[37:43.820 --> 37:48.160]  And this gets back to the modeling of the various types of attacks
[37:49.180 --> 37:53.460]  and building the database of what these attacks are going to look like.
[37:54.300 --> 37:57.060]  There's the raw information that we see,
[37:57.060 --> 38:01.440]  the usual stuff like the time, the IP address, the ports, things like that.
[38:01.840 --> 38:05.740]  But what's important to us is a bit more descriptive,
[38:09.720 --> 38:11.820]  abstracted information about the attack.
[38:11.900 --> 38:15.100]  Things that would give us a bit of insight
[38:15.100 --> 38:18.400]  into why someone or a particular attack might be using this.
[38:18.400 --> 38:21.400]  Whether they're just going for the easy, quick attacks,
[38:21.400 --> 38:24.620]  whether they're doing things in a slow, stealthy, calculated way,
[38:24.620 --> 38:27.200]  whether it looks like they've done reconnaissance ahead of time
[38:27.200 --> 38:33.180]  and know a bit about the network and other things like that.
[38:35.000 --> 38:37.080]  These abstractions must be very accurate
[38:37.080 --> 38:39.500]  because otherwise the system will fall apart.
[38:39.500 --> 38:42.400]  If the abstractions are incorrect,
[38:42.400 --> 38:45.400]  you're going to make incorrect assumptions about what's happening.
[38:46.000 --> 38:50.540]  And this is where we're using what I mentioned a little bit ago
[38:50.540 --> 38:51.920]  about the vulnerabilities database
[38:51.920 --> 38:54.320]  to gather some of this information
[38:54.320 --> 38:59.460]  and build a sort of history of the models of these attacks.
[39:00.980 --> 39:04.820]  And again, while that's kind of the key to the system,
[39:04.820 --> 39:08.000]  the raw data also is, of course,
[39:08.000 --> 39:10.700]  very useful information to keep as well.
[39:12.280 --> 39:15.460]  So, the vulnerabilities database,
[39:15.460 --> 39:17.420]  this is similar to other things that are out there
[39:17.420 --> 39:18.920]  like the ICAT metabase
[39:18.920 --> 39:22.020]  and the various things like the CERT and CVEs
[39:22.020 --> 39:24.740]  and all of that, Bugtraq.
[39:24.900 --> 39:26.640]  And actually it ties into those
[39:26.640 --> 39:28.200]  and gets its information from those,
[39:28.200 --> 39:31.180]  but it's designed such that it's extensible
[39:31.180 --> 39:33.820]  with other information that we want to add in
[39:33.820 --> 39:36.640]  about the particular exploits and vulnerabilities
[39:36.640 --> 39:39.200]  so that we can build up
[39:39.200 --> 39:43.620]  these sort of more descriptive definitions of what's happening.
[39:46.120 --> 39:48.680]  I guess there's not too much more to say about that.
[39:48.680 --> 39:50.160]  It's kind of like just our own...
[39:50.160 --> 39:52.860]  We were eventually planning on kind of releasing this
[39:52.860 --> 39:55.060]  to allow public access to it,
[39:55.060 --> 39:59.360]  but right now it's still kind of our own private research tool.
[40:01.940 --> 40:05.760]  Just real briefly about modeling the attacker behavior.
[40:05.760 --> 40:07.520]  The attacker's behavior is dependent on their goals,
[40:07.520 --> 40:13.360]  whether they're trying to do the DDoS preparation
[40:14.020 --> 40:15.300]  where they're going out gathering zombies
[40:15.300 --> 40:18.960]  or that they're going after a particular single target.
[40:18.960 --> 40:21.140]  And I decided to make this current
[40:21.140 --> 40:22.660]  because I just heard yesterday
[40:22.660 --> 40:26.900]  that sans.org got hit yesterday morning.
[40:26.900 --> 40:30.160]  So that's obviously what someone was after then.
[40:33.560 --> 40:37.020]  I heard that Fluffy Bunny gets credit for that.
[40:38.800 --> 40:41.320]  Again, it's desirable to use Markovian models
[40:41.320 --> 40:45.040]  to model the attacker behavior.
[40:45.580 --> 40:47.220]  And it's possible that a particular attacker
[40:47.220 --> 40:48.980]  might be exhibiting multiple behaviors
[40:48.980 --> 40:50.440]  depending on what they're trying to do
[40:50.440 --> 40:54.600]  or that the behaviors can be described as basic components
[40:54.600 --> 40:56.600]  that can be combined into...
[40:57.220 --> 41:00.160]  the whole theory of starting off with your simple basic components
[41:00.160 --> 41:01.980]  that you combine in complex ways
[41:01.980 --> 41:06.720]  rather than designing things that are complex from the start.
[41:10.440 --> 41:14.080]  As a simplified version of a behavioral model,
[41:14.080 --> 41:18.740]  if we just look at two components, two dimensions,
[41:18.740 --> 41:23.920]  whether it's the attack method and the IP address space,
[41:23.920 --> 41:27.520]  if you could imagine if these two things weren't colored differently
[41:27.520 --> 41:31.660]  and you just saw this sort of blob of information coming in,
[41:31.660 --> 41:33.500]  that one of the hypotheses you could generate
[41:33.500 --> 41:36.400]  is that these were actually two different things going on here.
[41:36.400 --> 41:38.160]  There was one particular attack
[41:38.160 --> 41:41.420]  that's concentrating on a particular system on a local area,
[41:41.420 --> 41:44.860]  trying hard to get into it using many different methods
[41:44.860 --> 41:48.120]  of getting into that system or getting information about it,
[41:48.120 --> 41:53.260]  whereas the other one was using one particular type of attack
[41:53.260 --> 41:55.620]  but using it over a wide range of systems.
[41:55.720 --> 41:57.880]  These two things overlap in a certain area,
[41:57.880 --> 42:03.240]  but it's possible to differentiate the two, one from the other.
[42:05.860 --> 42:11.960]  So, in order to develop these behavioral models,
[42:11.960 --> 42:15.200]  we use some automated tools, some data mining things.
[42:15.220 --> 42:17.260]  There's a couple other groups that are working on this.
[42:17.260 --> 42:20.720]  I'm not sure of the current state of some of these things,
[42:20.720 --> 42:28.700]  but I know Wenke Lee down in North Carolina
[42:28.700 --> 42:30.860]  is working on a system, MADAM ID,
[42:30.860 --> 42:38.380]  that is an automated behavioral model generation system.
[42:41.460 --> 42:44.680]  And also, actually I just realized
[42:44.680 --> 42:46.280]  I got a little confused by my own slide
[42:46.280 --> 42:48.320]  and was jumping to something different.
[42:48.320 --> 42:51.060]  Really what I was trying to say here was actually that
[42:51.060 --> 42:54.400]  when someone uses automated tools to attack a system,
[42:54.920 --> 42:58.940]  that it's much easier to build a behavioral model for that.
[42:58.940 --> 43:00.520]  One of the first areas we started in
[43:00.520 --> 43:03.060]  was designing models for worms,
[43:03.060 --> 43:04.440]  because a worm is basically,
[43:04.920 --> 43:07.680]  it's like a script kiddie without the kiddie.
[43:07.680 --> 43:12.220]  It's kind of the ultimate little automated script kiddie.
[43:13.000 --> 43:15.980]  So it's very easy to know what the worm is going to do,
[43:15.980 --> 43:16.760]  how it's going to behave,
[43:16.760 --> 43:18.960]  and recognize this behavior in the future.
[43:20.580 --> 43:23.360]  A colleague of mine that works at ISTS, Bill Stearns,
[43:23.360 --> 43:26.380]  he recently, when we had the Lion and Ramen worms
[43:26.380 --> 43:28.440]  and all of that stuff going on,
[43:28.440 --> 43:30.520]  he was one of the ones working on the tools
[43:30.520 --> 43:33.520]  to go in and clean up the systems
[43:33.520 --> 43:36.480]  and try and figure out if these things were there
[43:36.480 --> 43:37.640]  and then patch it.
[43:37.640 --> 43:39.440]  And every time, it's the usual process,
[43:39.440 --> 43:41.240]  every time he would write one of these tools,
[43:41.240 --> 43:42.100]  he would put it out there
[43:42.100 --> 43:43.460]  and someone would modify the worm
[43:43.460 --> 43:45.000]  so it acted a bit differently.
[43:45.000 --> 43:51.040]  So while using the system that we're developing here
[43:52.040 --> 43:54.280]  doesn't help you actually go in and clean out
[43:54.280 --> 43:57.120]  because it can't know things at that low level of detail,
[43:57.120 --> 43:59.880]  it's easy to build a model that would actually,
[43:59.880 --> 44:02.260]  that would detect and recognize the presence
[44:02.260 --> 44:04.040]  of all the different types of worms
[44:04.040 --> 44:07.340]  and make it much harder for someone to modify the worm
[44:07.340 --> 44:10.020]  in such a way that it avoided detection.
[44:10.460 --> 44:11.780]  You still have the problem, of course,
[44:11.780 --> 44:13.680]  of going in and having to clean it up.
[44:14.760 --> 44:16.560]  Manual attacks, where you've got somebody
[44:16.560 --> 44:18.240]  just sat down behind their computer
[44:18.240 --> 44:20.240]  deciding what they're going to do at every stage,
[44:20.240 --> 44:22.300]  not doing anything automatically,
[44:22.300 --> 44:24.720]  are, of course, much harder to predict.
[44:24.900 --> 44:27.320]  The particular approach we're using towards this
[44:27.320 --> 44:28.920]  is a game theory approach
[44:28.920 --> 44:30.860]  where we treat it like a game
[44:30.860 --> 44:32.860]  between the attacker and the security system
[44:32.860 --> 44:37.660]  and assuming that the attacker moves with intent
[44:37.660 --> 44:39.460]  and moves rationally,
[44:39.460 --> 44:41.280]  we figure out
[44:41.280 --> 44:42.980]  what course of action they're most likely
[44:42.980 --> 44:45.540]  going to take to maximize their gain
[44:45.540 --> 44:47.140]  and minimize their risk.
[44:49.060 --> 44:51.320]  Like I said, attacker versus security systems.
[44:51.320 --> 44:52.700]  Both sides have imperfect knowledge.
[44:52.700 --> 44:55.680]  They don't know exactly what the other side is doing.
[44:55.680 --> 44:56.920]  They can repeat the attempts
[44:56.920 --> 45:01.540]  and they gain knowledge as the game goes on.
[45:09.020 --> 45:11.260]  Without getting too much more into the game theory,
[45:11.260 --> 45:14.220]  right now it's not that developed
[45:14.220 --> 45:17.200]  and it's not all that interesting at this stage.
[45:17.460 --> 45:20.080]  That's about the end of my presentation
[45:20.080 --> 45:23.140]  on the distributed intrusion detection stuff.
[45:23.140 --> 45:24.880]  I'm just going to put in a few plugs
[45:24.880 --> 45:26.540]  for some of the guys I work with
[45:27.100 --> 45:28.840]  with their research.
[45:29.200 --> 45:31.940]  The quantitative security risk assessment.
[45:31.940 --> 45:34.700]  This is kind of a formalization of methods
[45:34.700 --> 45:38.920]  to analyze the security risks on your network
[45:38.920 --> 45:41.200]  and determine where you can best spend your money
[45:41.200 --> 45:43.480]  to improve it, get the most bang for your buck.
[45:45.240 --> 45:47.160]  Especially if you have limited resources,
[45:47.160 --> 45:49.840]  monetary or otherwise, human resources, whatever.
[45:50.160 --> 45:52.580]  And the other area of interest in this
[45:52.580 --> 45:54.500]  is in insurance for networks.
[45:54.620 --> 45:57.900]  As far as I know, and this isn't my work,
[45:57.900 --> 45:59.680]  so I'm not the one to know.
[45:59.680 --> 46:02.860]  Currently, Lloyd's of London has actually started
[46:03.500 --> 46:05.400]  providing insurance policies for networks
[46:05.400 --> 46:06.760]  against attacks.
[46:06.760 --> 46:08.900]  And so far, they're the only insurance company
[46:08.900 --> 46:11.040]  to be doing this.
[46:11.420 --> 46:13.640]  And they know what they're doing,
[46:13.640 --> 46:16.200]  so they must be using methods similar to this.
[46:17.240 --> 46:18.600]  The forensic tool development
[46:18.600 --> 46:20.580]  is another big thing going on there.
[46:21.400 --> 46:23.640]  Just a couple examples of what they're doing.
[46:23.640 --> 46:27.300]  One is doing digital camera identification.
[46:27.300 --> 46:30.340]  This is built on a tool called EnCase
[46:30.340 --> 46:31.620]  that's used by law enforcement
[46:31.620 --> 46:35.100]  to analyze files that are found
[46:35.100 --> 46:39.100]  on a computer that's taken as evidence
[46:39.100 --> 46:40.880]  in a particular case.
[46:41.360 --> 46:43.360]  And amazingly, when you take pictures
[46:43.360 --> 46:44.480]  with digital cameras,
[46:44.480 --> 46:46.120]  every different camera puts in
[46:46.120 --> 46:47.820]  different sort of metadata
[46:47.820 --> 46:50.000]  into the file that it creates
[46:50.000 --> 46:53.140]  and can actually, depending on the particular
[46:53.140 --> 46:55.320]  brand of camera and that,
[46:55.320 --> 46:57.580]  it can actually tie it right back
[46:57.580 --> 47:00.320]  to a particular camera with a particular serial number.
[47:00.400 --> 47:02.920]  And this is being used now.
[47:02.920 --> 47:04.940]  This is actually starting to be used now.
[47:04.940 --> 47:06.600]  It's one of our first products
[47:06.600 --> 47:09.500]  actually in use by various local law enforcement agencies
[47:09.500 --> 47:14.400]  in the pursuit of child porn cases
[47:14.720 --> 47:18.720]  to where if they find the images on a computer
[47:18.720 --> 47:20.660]  and then they can also tie that
[47:20.660 --> 47:23.560]  to a camera that is owned by that person
[47:23.560 --> 47:26.820]  then they know they've got a stronger case against them.
[47:26.820 --> 47:28.920]  Also, they're working on data recovery tools
[47:28.920 --> 47:31.780]  that do reconstruction of partially deleted files
[47:31.780 --> 47:33.040]  where you've lost part of your file
[47:33.040 --> 47:36.080]  and it tries to recreate either the JPEG
[47:36.080 --> 47:37.440]  or the document from it
[47:37.440 --> 47:39.880]  or whatever type of file it is.
[47:41.120 --> 47:42.520]  Like I said, other projects,
[47:42.520 --> 47:45.000]  semantic hacking, internet health monitoring,
[47:45.000 --> 47:47.000]  large scale network simulation.
[47:47.000 --> 47:49.220]  That's not in the group that I'm working in
[47:49.220 --> 47:50.360]  but in the group upstairs.
[47:50.360 --> 47:53.720]  They're doing simulations of computer networks
[47:53.720 --> 47:59.580]  consisting of on the order of
[47:59.580 --> 48:01.320]  tens of thousands to hundreds of thousands
[48:01.320 --> 48:05.020]  of computer nodes in the simulation.
[48:05.020 --> 48:07.500]  Not computer nodes being used to run the simulation
[48:07.500 --> 48:10.200]  but computers in the simulated network.
[48:10.680 --> 48:11.960]  And the security informant
[48:11.960 --> 48:15.360]  in the user mode Linux HoneyNet project.
[48:16.640 --> 48:19.420]  Basically, information about all of that stuff.
[48:19.420 --> 48:20.200]  If anybody is interested,
[48:20.200 --> 48:21.280]  you can find it at the website
[48:22.540 --> 48:24.460]  ists.dartmouth.edu.
[48:24.460 --> 48:29.520]  The group I work for is the IRIA group.
[48:29.520 --> 48:30.540]  It's off that website too.
[48:30.540 --> 48:32.640]  And that's my email address if anybody is interested.
[48:33.840 --> 48:35.380]  That's about it for my presentation.
[48:35.380 --> 48:38.260]  If anybody has any questions...
[48:44.240 --> 48:45.340]  The question was,
[48:45.340 --> 48:47.440]  how much time do we have invested in the database?
[48:50.240 --> 48:52.340]  Not a whole lot.
[48:53.200 --> 48:54.140]  It was kind of...
[48:54.140 --> 48:55.520]  It's been right now...
[48:55.520 --> 48:58.600]  This is why it's just our own little internal thing right now.
[48:58.600 --> 49:00.240]  This is kind of quick and dirty.
[49:02.140 --> 49:03.700]  We've got a good model behind it
[49:03.700 --> 49:05.280]  as far as it being extensible.
[49:05.600 --> 49:08.580]  But it's kind of a...
[49:08.580 --> 49:10.080]  It's a lot of manual work
[49:10.080 --> 49:11.800]  to get things in and out of it right now.
[49:11.800 --> 49:13.900]  But it's good for our purposes.
[49:15.460 --> 49:16.640]  Out of curiosity,
[49:16.640 --> 49:17.900]  why were you asking?
[49:17.900 --> 49:19.700]  Any particular...
[49:29.560 --> 49:30.360]  Okay.
[49:31.900 --> 49:33.280]  When's your talk?
[49:34.320 --> 49:35.040]  Okay.
[49:35.040 --> 49:36.520]  Yeah, I'd love to hear that.
[49:46.230 --> 49:47.350]  The question was,
[49:47.350 --> 49:49.390]  based on how long it takes to do these computations,
[49:49.390 --> 49:52.290]  how long should we keep forensic evidence available?
[49:52.590 --> 49:55.070]  It's a hard question to answer.
[49:55.070 --> 49:57.870]  It depends on a lot of factors.
[49:57.870 --> 50:01.650]  One, on the type of attack.
[50:01.730 --> 50:04.930]  Some attacks are very spread out.
[50:07.010 --> 50:09.850]  Some take place over a short amount of time.
[50:10.950 --> 50:13.750]  What we're hoping to do
[50:13.750 --> 50:18.290]  is actually model this based on different types of attacks,
[50:18.290 --> 50:19.770]  different types of things you're looking for,
[50:19.770 --> 50:22.250]  to come out with some sort of chart
[50:22.250 --> 50:24.290]  of how long you need to keep stuff.
[50:24.310 --> 50:26.030]  Most things we're looking at
[50:26.030 --> 50:28.470]  are actually on a fairly short time scale,
[50:28.470 --> 50:30.850]  like around a week
[50:30.850 --> 50:33.570]  to a few weeks.
[50:34.250 --> 50:36.890]  Beyond that, it gets very expensive
[50:36.890 --> 50:40.190]  to maintain the storage systems.
[50:41.410 --> 50:42.390]  Yes?
[50:54.800 --> 50:56.920]  I'm sorry, I didn't quite hear that.
[51:03.380 --> 51:04.740]  Oh, the question was,
[51:04.740 --> 51:08.680]  how far do I think that commercial
[51:08.680 --> 51:10.940]  IDS vendors and such,
[51:10.940 --> 51:15.080]  and ISPs, are from providing a product like this?
[51:15.860 --> 51:17.600]  I think the biggest challenge
[51:17.600 --> 51:20.700]  to getting something like this out there
[51:20.700 --> 51:24.340]  is actually more political than anything else
[51:24.340 --> 51:27.160]  in getting people to share the information.
[51:27.180 --> 51:30.400]  People aren't really that willing to do this right now.
[51:30.400 --> 51:33.100]  And we're working with various government agencies
[51:33.100 --> 51:36.260]  where they can be ordered to do so.
[51:38.520 --> 51:40.760]  This is a ways off.
[51:42.720 --> 51:45.560]  We're not expecting anything coming out of this
[51:45.560 --> 51:48.220]  for a number of years, probably.
[51:49.320 --> 51:50.760]  On the left?
[52:09.740 --> 52:11.300]  The question was,
[52:11.300 --> 52:13.440]  are we using any particular methods
[52:13.440 --> 52:15.460]  for either windowing the data,
[52:15.460 --> 52:17.840]  looking at it in different segments of time?
[52:18.280 --> 52:21.920]  Yes, this is actually one of the big areas of research.
[52:21.920 --> 52:24.440]  It's trying to figure out...
[52:25.220 --> 52:26.760]  We haven't done so much in this,
[52:26.760 --> 52:29.580]  but it's one of the things we know we need to work on.
[52:29.580 --> 52:31.440]  It's looking at combining the information
[52:32.120 --> 52:34.600]  in different segments over time.
[52:34.600 --> 52:37.600]  One thing we've been doing is relaxing time constraints
[52:37.600 --> 52:39.280]  to look at things and say,
[52:39.280 --> 52:42.740]  all of these things are close enough together.
[52:42.740 --> 52:45.480]  We'll just throw this as a chunk, throw this as a chunk.
[52:45.580 --> 52:49.120]  Because you get down to a certain quantum of time
[52:49.120 --> 52:50.580]  and it doesn't make any sense anyway
[52:50.580 --> 52:53.440]  because you can't correlate it between systems that much.
[52:53.440 --> 52:55.840]  So, that is important.
[53:02.120 --> 53:05.120]  The question was for the three of us,
[53:05.120 --> 53:06.860]  or three of you, who...
[53:06.860 --> 53:10.260]  I'm only seeing one so far that's interested in the formulas.
[53:10.340 --> 53:11.840]  Will that be available online?
[53:11.840 --> 53:14.540]  And yes, I will put that up online.
[53:14.540 --> 53:19.360]  It will probably be a little while,
[53:19.360 --> 53:21.400]  maybe a couple weeks before I get around to it.
[53:21.420 --> 53:23.920]  I'm not going to be home for another week and a half.
[53:26.560 --> 53:28.520]  I will make it available online.
[53:28.520 --> 53:31.460]  It will be on the ISTS website off the IRIA.
[53:33.200 --> 53:38.020]  Both the presentation and some other papers we have on it.
[53:45.420 --> 53:49.720]  The question was, specify the amount of data to keep based on time
[53:49.720 --> 53:54.660]  and what does that look like based on the number of alerts on the IDS.
[53:55.520 --> 53:57.820]  To tell you the truth, I'm not really sure.
[53:57.820 --> 54:01.180]  I mean, it depends on how much is your network getting attacked.
[54:06.300 --> 54:11.220]  On our network, on the networks that we've been working on so far,
[54:11.220 --> 54:14.180]  we're talking usually on the order of...
[54:14.880 --> 54:17.300]  over a week period, a couple week periods,
[54:17.300 --> 54:20.400]  over the large number of networks,
[54:20.400 --> 54:23.540]  we've been looking at on the order of tens of thousands of alerts,
[54:23.540 --> 54:27.280]  but that's cumulative over the size of networks that we're working on.
[54:27.740 --> 54:29.820]  That's what we've been dealing with.
[54:29.820 --> 54:35.500]  That number will, of course, go up as it gets more widespread, larger networks.
[54:35.520 --> 54:39.440]  And it's going to be an interesting task to see where the balancing point is
[54:39.440 --> 54:44.480]  between how much information you need to be able to accurately track things
[54:44.480 --> 54:47.720]  and when do you get to the point where you get overwhelmed.
[54:53.130 --> 54:59.230]  Well, if that's about it, I guess the next thing for me to do
[54:59.230 --> 55:03.930]  is let you know that I believe Ian Goldberg is speaking in here next
[55:03.930 --> 55:06.370]  on arranging anonymous rendezvous,
[55:06.370 --> 55:09.930]  and then Jay Beale a bit later on about Bastille Linux.
[55:09.990 --> 55:11.570]  So, thank you all.
